Learning Gene Functional Classifications from Multiple Data Types

Authors

  • PAUL PAVLIDIS
  • JASON WESTON
  • JINSONG CAI
Abstract

In our attempts to understand cellular function at the molecular level, we must be able to synthesize information from disparate types of genomic data. We consider the problem of inferring gene functional classifications from a heterogeneous data set consisting of DNA microarray expression measurements and phylogenetic profiles from whole-genome sequence comparisons. We demonstrate the application of the support vector machine (SVM) learning algorithm to this functional inference task. Our results suggest the importance of exploiting prior information about the heterogeneity of the data. In particular, we propose an SVM kernel function that is explicitly heterogeneous. In addition, we describe feature scaling methods for further exploiting prior knowledge of heterogeneity by giving each data type different weights.
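
The contrast between a homogeneous and an explicitly heterogeneous kernel can be illustrated with a small sketch. It assumes each gene is one vector whose first `n_expr` entries are expression measurements and whose remaining entries are phylogenetic-profile values (a hypothetical layout). The homogeneous kernel applies one polynomial kernel to the whole vector; the heterogeneous kernel sums separate kernels, one per data type. The per-type `weights` are an illustrative stand-in for weighting data types: they scale each sub-kernel rather than the raw features, so they need not match the paper's feature-scaling method. All function names are hypothetical, and no SVM training step is shown; the Gram matrix is what a kernel SVM solver would consume.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def poly_kernel(x: Vector, y: Vector, degree: int = 3) -> float:
    """Polynomial kernel (x . y + 1) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** degree


def homogeneous_kernel(x: Vector, y: Vector, degree: int = 3) -> float:
    """One kernel on the concatenated vector; for degree > 1 its expansion
    contains products of expression and phylogenetic features."""
    return poly_kernel(x, y, degree)


def heterogeneous_kernel(x: Vector, y: Vector, n_expr: int,
                         weights: Tuple[float, float] = (1.0, 1.0),
                         degree: int = 3) -> float:
    """Weighted sum of one kernel per data type (illustrative sketch).

    Assumption: the first n_expr entries are expression values and the
    rest are phylogenetic-profile values. The weights are hypothetical
    per-type weights and must be nonnegative to keep a valid kernel.
    """
    w_expr, w_phylo = weights
    k_expr = poly_kernel(x[:n_expr], y[:n_expr], degree)
    k_phylo = poly_kernel(x[n_expr:], y[n_expr:], degree)
    return w_expr * k_expr + w_phylo * k_phylo


def gram_matrix(data: Sequence[Vector],
                kernel: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Pairwise kernel values, the input a kernel SVM solver works from."""
    return [[kernel(a, b) for b in data] for a in data]


if __name__ == "__main__":
    # Toy genes: 2 expression values followed by 2 phylogenetic-profile bits.
    genes = [[0.5, -1.0, 1.0, 0.0],
             [0.4, -0.8, 1.0, 1.0]]
    hetero = gram_matrix(
        genes, lambda a, b: heterogeneous_kernel(a, b, n_expr=2, degree=2))
    homo = gram_matrix(genes, lambda a, b: homogeneous_kernel(a, b, degree=2))
    print("heterogeneous:", hetero)
    print("homogeneous:  ", homo)
```

For two genes with expression dot product a and phylogenetic dot product b, the degree-2 homogeneous kernel (a + b + 1)^2 contains the cross term 2ab that couples the two data types, while the summed kernel (a + 1)^2 + (b + 1)^2 does not; for the toy pair above (a = b = 1) the off-diagonal entry is 9.0 for the homogeneous kernel and 8.0 for the heterogeneous one. Because a nonnegative weighted sum of valid kernels is itself a valid kernel, the weighted version can be handed to any kernel SVM solver unchanged.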

Similar resources

HYDRA-MM: Learning Multiple Descriptions to Improve Classification Accuracy

For learning tasks with few examples, greater classification accuracy can be achieved by learning several concept descriptions for each class in the data and producing a classification that combines evidence from multiple descriptions. Stochastic (randomized) search can be used to generate many concept descriptions for each class. Here we use a tractable approximation to the optimal Bayesian meth...

On Learning Multiple Descriptions of a Concept

In sparse data environments, greater classification accuracy can be achieved by learning several concept descriptions of the data and combining their classifications. Stochastic search is a general tool which can be used to generate many good concept descriptions (rule sets) for each class in the data. Bayesian probability theory offers an optimal strategy for combining classifications of the indiv...

Hydra-mm: Learning Multiple Descriptions to Improve Classification Accuracy

For learning tasks with few examples, greater classification accuracy can be achieved by learning several concept descriptions for each class in the data and producing a classification that combines evidence from multiple descriptions. Stochastic (randomized) search can be used to generate many concept descriptions for each class. Here we use a tractable approximation to the optimal Bayesian meth...

Boosting Trees for Cost-Sensitive Classifications

This paper explores two boosting techniques for cost-sensitive tree classifications in the situation where misclassification costs change very often. Ideally, one would like to have only one induction, and use the induced model for different misclassification costs. Thus, it demands robustness of the induced model against cost changes. Combining multiple trees gives robust predictions against this ...

Distributed Learning on Very Large Data Sets

One approach to learning from intractably large data sets is to utilize all the training data by learning models on tractably sized subsets of the data. The subsets of data may be disjoint or partially overlapping. The individual learned models may be combined into a single model or a voting approach may be used to combine the classifications of a set of models. An approach to learning models in...

Publication date: 2002